77 research outputs found
On some provably correct cases of variational inference for topic models
Variational inference is a very efficient and popular heuristic used in
various forms in the context of latent variable models. It's closely related to
Expectation Maximization (EM), and is applied when exact EM is computationally
infeasible. Despite being immensely popular, current theoretical understanding
of the effectiveness of variational-inference-based algorithms is very limited.
In this work we provide the first analysis of instances where variational
inference algorithms converge to the global optimum, in the setting of topic
models.
More specifically, we show that variational inference provably learns the
optimal parameters of a topic model under natural assumptions on the topic-word
matrix and the topic priors. The properties that the topic-word matrix must
satisfy in our setting are related to the topic expansion assumption introduced
in (Anandkumar et al., 2013), as well as the anchor words assumption in (Arora
et al., 2012c). The assumptions on the topic priors are related to the
well-known Dirichlet prior, introduced to the area of topic modeling by (Blei et
al., 2003).
It is well known that initialization plays a crucial role in how well
variational-inference-based algorithms perform in practice. The initializations that we
use are fairly natural. One of them is similar to what is currently used in
LDA-c, the most popular implementation of variational inference for topic
models. The other one is an overlapping clustering algorithm, inspired by a
work by (Arora et al., 2014) on dictionary learning, which is very simple and
efficient.
While our primary goal is to provide insights into when variational inference
might work in practice, the multiplicative, rather than additive, nature of
the variational inference updates forces us to use fairly non-standard proof
arguments, which we believe will be of general interest.
Comment: 46 pages. Compared to previous version: clarified notation, a number
of typos fixed throughout paper
Center-based Clustering under Perturbation Stability
Clustering under most popular objective functions is NP-hard, even to
approximate well, and so unlikely to be efficiently solvable in the worst case.
Recently, Bilu and Linial suggested an approach aimed at
bypassing this computational barrier by using properties of instances one might
hope to hold in practice. In particular, they argue that instances in practice
should be stable to small perturbations in the metric space and give an
efficient algorithm for clustering instances of the Max-Cut problem that are
stable to perturbations of size $O(n^{1/2})$. In addition, they conjecture that
instances stable to as little as $O(1)$ perturbations should be solvable in
polynomial time. In this paper we prove that this conjecture is true for any
center-based clustering objective (such as $k$-median, $k$-means, and
$k$-center). Specifically, we show we can efficiently find the optimal
clustering assuming only stability to factor-3 perturbations of the underlying
metric in spaces without Steiner points, and stability to factor $2+\sqrt{3}$
perturbations for general metrics. In particular, we show for such instances
that the popular Single-Linkage algorithm combined with dynamic programming
will find the optimal clustering. We also present NP-hardness results under a
weaker but related condition.
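As a rough illustration of the Single-Linkage step named in the abstract above, the following minimal sketch merges the closest pair of clusters until a target number remains. It is a simplification under stated assumptions: the function name `single_linkage`, its parameters, and the stop-at-k rule are hypothetical, and the dynamic-programming step that the abstract combines with Single-Linkage is not reproduced here.

```python
from itertools import combinations


def single_linkage(points, k, dist):
    """Kruskal-style single linkage: repeatedly merge the two closest
    clusters until only k clusters remain. Returns lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs of points, sorted by increasing distance.
    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: dist(points[e[0]], points[e[1]]),
    )
    clusters = len(points)
    for a, b in edges:
        if clusters <= k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:  # the pair lies in two different clusters: merge them
            parent[ra] = rb
            clusters -= 1

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pts = [0.0, 1.0, 10.0, 11.0, 50.0]
    print(single_linkage(pts, 3, lambda x, y: abs(x - y)))
```

On well-separated data such as the one-dimensional example in the `__main__` guard, stopping the merges at k clusters already recovers the intended groups; in general, stopping at k is only a heuristic.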
Learning using Local Membership Queries
We introduce a new model of membership query (MQ) learning, where the
learning algorithm is restricted to query points that are \emph{close} to
random examples drawn from the underlying distribution. The learning model is
intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where
the queries are allowed to be arbitrary points).
Membership query algorithms are not popular among machine learning
practitioners. Apart from the obvious difficulty of adaptively querying
labelers, it has also been observed that querying \emph{unnatural} points leads
to increased noise from human labelers (Lang and Baum, 1992). This motivates
our study of learning algorithms that make queries that are close to examples
generated from the data distribution.
We restrict our attention to functions defined on the $n$-dimensional Boolean
hypercube and say that a membership query is local if its Hamming distance from
some example in the (random) training data is at most $O(\log(n))$. We show the
following results in this model:
(i) The class of sparse polynomials (with coefficients in R) over $\{0,1\}^n$
is polynomial time learnable under a large class of \emph{locally smooth}
distributions using $O(\log(n))$-local queries. This class also includes the
class of $O(\log(n))$-depth decision trees.
(ii) The class of polynomial-sized decision trees is polynomial time
learnable under product distributions using $O(\log(n))$-local queries.
(iii) The class of polynomial size DNF formulas is learnable under the
uniform distribution using $O(\log(n))$-local queries in time
$n^{O(\log(\log(n)))}$.
(iv) In addition, we prove a number of results relating the proposed model to
the traditional PAC model and the PAC+MQ model.
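The locality condition on membership queries described above can be sketched as a simple check. This is only an illustration: the names `hamming` and `is_local_query`, and the explicit radius parameter `r`, are assumptions introduced here, not notation taken from the paper.

```python
def hamming(x, y):
    """Number of coordinates where two equal-length bit tuples differ."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(a != b for a, b in zip(x, y))


def is_local_query(query, sample, r):
    """True if `query` is within Hamming distance r of at least one
    example in the (random) training sample."""
    return any(hamming(query, x) <= r for x in sample)


if __name__ == "__main__":
    sample = [(0, 0, 0, 0), (1, 1, 1, 1)]
    print(is_local_query((0, 0, 0, 1), sample, 1))
    print(is_local_query((0, 0, 1, 1), sample, 1))
```

A learner restricted to local queries would call such a check before asking for a label, so that every queried point stays close to some example drawn from the data distribution.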
Almost Optimal Stochastic Weighted Matching With Few Queries
We consider the {\em stochastic matching} problem. An edge-weighted general
(i.e., not necessarily bipartite) graph $G(V, E)$ is given in the input, where
each edge in $E$ is {\em realized} independently with probability $p$; the
realization is initially unknown, however, we are able to {\em query} the edges
to determine whether they are realized. The goal is to query only a small
number of edges to find a {\em realized matching} that is sufficiently close to
the maximum matching among all realized edges. This problem has received
considerable attention during the past decade due to its numerous real-world
applications in kidney-exchange, matchmaking services, online labor markets,
and advertisements.
Our main result is an {\em adaptive} algorithm that for any arbitrarily small
$\epsilon > 0$, finds a $(1-\epsilon)$-approximation in expectation, by
querying only $O(1)$ edges per vertex. We further show that our approach leads
to a $(1/2-\epsilon)$-approximate {\em non-adaptive} algorithm that also
queries only $O(1)$ edges per vertex. Prior to our work, no nontrivial
approximation was known for weighted graphs using a constant per-vertex budget.
The state-of-the-art adaptive (resp. non-adaptive) algorithm of Maehara and
Yamaguchi [SODA 2018] achieves a $(1-\epsilon)$-approximation (resp.
$(1/2-\epsilon)$-approximation) by querying up to $O(w \log n)$ edges per
vertex where $w$ denotes the maximum integer edge-weight. Our result is a
substantial improvement over this bound and has an appealing message: No matter
what the structure of the input graph is, one can get arbitrarily close to the
optimum solution by querying only a constant number of edges per vertex.
To obtain our results, we introduce novel properties of a generalization of
{\em augmenting paths} to weighted matchings that may be of independent
interest.
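To make the query model in the abstract above concrete, here is a minimal sketch of a naive adaptive baseline: it scans edges by decreasing weight, queries an edge only while both endpoints are unmatched and within a per-vertex query budget, and keeps every edge that turns out to be realized. This greedy procedure is NOT the algorithm of the paper and carries no approximation guarantee; the function name `greedy_query_matching` and the parameters `p`, `budget`, and `seed` are hypothetical.

```python
import random


def greedy_query_matching(edges, p, budget, seed=0):
    """edges: list of (u, v, weight). Each query reveals whether an edge is
    realized (independently with probability p). Returns the matched edges."""
    rng = random.Random(seed)
    queries = {}     # vertex -> number of queries spent so far
    matched = set()  # vertices already covered by the matching
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u in matched or v in matched:
            continue  # adaptive: never query an edge that cannot be used
        if queries.get(u, 0) >= budget or queries.get(v, 0) >= budget:
            continue  # respect the per-vertex query budget
        queries[u] = queries.get(u, 0) + 1
        queries[v] = queries.get(v, 0) + 1
        # Drawing the coin at query time is equivalent to fixing the
        # realization up front, because edges are realized independently.
        if rng.random() < p:
            matched.update((u, v))
            matching.append((u, v, w))
    return matching


if __name__ == "__main__":
    graph = [("a", "b", 5), ("b", "c", 4), ("c", "d", 3)]
    print(greedy_query_matching(graph, p=0.5, budget=2))
```

The sketch only shows how queries, realizations, and the per-vertex budget interact; closing the gap to the optimum with a constant budget is precisely what the abstract's result addresses.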